Curling Sports Analytics

Basics of Curling

If you are unfamiliar with curling, the World Curling Federation has a nice 2-minute intro to the sport:

In curling, two teams of four players take turns sliding rocks down a sheet of ice toward a target painted on the far end of the ice (called the house or rings), sweeping the ice in front of the rock to control its path. Each match has a series of scoring opportunities called ends (similar to innings in baseball), typically 8 or 10 ends, and the team with the most points at the conclusion of the match wins. Unlike in most sports, teams often concede early if they are too far behind to have a realistic chance of winning.

In each end, the teams take turns sliding rocks until each team has delivered 8 rocks. The team that delivers the last rock (called hammer) has a big advantage in scoring, since they have the last chance to rearrange rocks to maximize their score.

The score for the end is based on the placement of the rocks after all rocks are delivered. If no rocks are in the house, both teams score zero points. If at least one rock is touching the house, the team with the rock closest to the center of the rings scores one point for each of their rocks that is closer to the center than the nearest opponent rock. For example, if the closest rock to the center is red and the second closest is yellow, red scores 1 point. If the closest three rocks to the center are all yellow and the fourth closest is red, then yellow scores 3 points.
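The scoring rule above can be written as a short Python function. This is a minimal sketch, not the code behind this analysis. It assumes each rock is recorded as a hypothetical `(team, distance)` pair, where `distance` is measured from the center of the house to the nearest edge of the rock, so a rock touches the house when that distance is at most the 6-foot radius of the 12-foot house.

```python
HOUSE_RADIUS_FT = 6.0  # the house is 12 feet across


def score_end(rocks, house_radius=HOUSE_RADIUS_FT):
    """Return (scoring_team, points) for one end, or (None, 0) if blank.

    rocks: list of (team, distance) pairs. Assumption: distance runs from the
    center of the house to the nearest edge of the rock, so a rock touches
    the house when distance <= house_radius.
    Exact ties in distance are not handled specially.
    """
    # Keep only rocks touching the house, closest to the center first.
    in_house = sorted((dist, team) for team, dist in rocks if dist <= house_radius)
    if not in_house:
        return None, 0  # no rocks in the house: nobody scores
    winner = in_house[0][1]
    points = 0
    # Count the winner's rocks until the first opponent rock is reached.
    for _, team in in_house:
        if team != winner:
            break
        points += 1
    return winner, points


# The two examples from the text.
print(score_end([("red", 0.5), ("yellow", 1.0)]))  # ('red', 1)
print(score_end([("yellow", 0.2), ("yellow", 1.0), ("yellow", 2.0), ("red", 3.0)]))  # ('yellow', 3)
```

Rocks outside the house are filtered out before sorting, so an opponent rock that is "next closest" but not in the house does not stop the count.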

Scoring in curling

The common types of scores for one end are:

If the end is blanked, the team with hammer keeps it in the next end. If one team scores, the team that did not score gets hammer in the next end. The team with hammer will usually try to convert (maximizing their score) or blank (saving their advantage for a better opportunity later), and the team without hammer will usually try to steal or force.
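These hammer rules are enough to reconstruct who had hammer in every end from the first-end hammer and the line score. The sketch below is illustrative only; it borrows the sign conventions of the `Ham1` and `End` columns described in the database table below (1 or positive for Team 1, -1 or negative for Team 2) and assumes unknown or `X` ends have already been removed.

```python
def hammer_by_end(ham1, end_scores):
    """Return which team has hammer in each end (1 = Team 1, -1 = Team 2).

    ham1 follows the Ham1 convention (1 or -1), and end_scores follow the End
    convention: positive = Team 1 scored, negative = Team 2 scored,
    0 = blank end. Unknown or 'X' ends are assumed to be removed already.
    """
    hammers = []
    hammer = ham1
    for score in end_scores:
        hammers.append(hammer)
        if score > 0:
            hammer = -1  # Team 1 scored, so Team 2 gets hammer next end
        elif score < 0:
            hammer = 1   # Team 2 scored, so Team 1 gets hammer next end
        # score == 0 (blank end): the team with hammer keeps it
    return hammers


print(hammer_by_end(1, [0, 2, -1, 1]))  # [1, 1, -1, 1]
```

A steal is handled by the same two branches: whichever team scored, the other team has hammer in the next end.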

Database Details

CurlingZone maintains a database of the line scores from matches in most major events. Since the website does not allow API access, I used webscraping to construct my own SQL database of the line scores (scores in each end) of over 170,000 matches, along with basic event and team information. The analysis below is based on that database.

The database contains four tables:

  1. Matches: This is the main table that stores information about each match, including when it happened, who played, and the line scores.
  2. Events: This table includes names and dates of each curling event. Events vary in size, but most include ~10-100 matches.
  3. Teams: This table includes the world ranking points for men's and women's teams for each season from 2010-2011 through 2023-2024, as well as whether the team is a men's or women's team. (Curling also has formats like mixed doubles, wheelchair, junior men/women, etc., which are not currently included in this table.)
  4. Updates: This table records each time the update_database script has been run, which lets that script update only the matches in events that happened since the last update.

The detailed structure of each table is listed below:

| Table | Column | Description |
|---|---|---|
| Matches | EID | ID number of the event that the match was part of (foreign key to Events) |
| Matches | Draw | Time slot (draw) of the match within the event |
| Matches | Team1 | Name of the first team playing in the match (foreign key to Teams) |
| Matches | Team2 | Name of the second team playing in the match (foreign key to Teams) |
| Matches | Final1 | Final score of Team 1 |
| Matches | Final2 | Final score of Team 2 |
| Matches | Ham1 | Which team had hammer in End 1 to start the game: 1 for Team 1, -1 for Team 2, Null for unknown |
| Matches | End1 | Score in End 1. Positive means Team 1 scored, negative means Team 2 scored, zero means neither team scored, and X or Null means the game was over or the score is unknown. |
| Matches | End2, End3, ... | Analogous to End1; columns for Ends 1-12 are included. |
| Events | EID | ID number of the event |
| Events | Name | Name of the event, in words |
| Events | Date | Start date of the event. Events usually last 1-12 days. |
| Events | Type | Identifies the event as mens, womens, etc. |
| Teams | Name | Name of the team, usually the name of the team's skip (captain) |
| Teams | Type | Identifies the team as mens, womens, etc. |
| Teams | Location | Country, state, or province where the team was most recently based |
| Teams | World2011 | World ranking points the team earned in the 2010-2011 season |
| Teams | World2012, World2013, ... | Same as above, for seasons through 2023-2024 |
| Updates | date | Date when the update_database script was run |
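For readers who want to build something similar, here is one hypothetical way to declare these tables with Python's built-in `sqlite3` module. The column names come from the table above; the choice of SQLite and every column type are assumptions. The End columns are declared as `TEXT` because a cell can hold `X` as well as a score.

```python
import sqlite3

# Hypothetical SQLite layout of the four tables. Column names follow the
# table above; the types (and the use of SQLite at all) are assumptions.
END_COLUMNS = ",\n    ".join(f"End{i} TEXT" for i in range(1, 13))
WORLD_COLUMNS = ", ".join(f"World{year} REAL" for year in range(2011, 2025))

SCHEMA = f"""
CREATE TABLE Events (EID INTEGER PRIMARY KEY, Name TEXT, Date TEXT, Type TEXT);
CREATE TABLE Teams (Name TEXT, Type TEXT, Location TEXT, {WORLD_COLUMNS});
CREATE TABLE Matches (
    EID INTEGER REFERENCES Events(EID),
    Draw TEXT,
    Team1 TEXT,    -- team name, looked up in Teams.Name
    Team2 TEXT,    -- team name, looked up in Teams.Name
    Final1 INTEGER,
    Final2 INTEGER,
    Ham1 INTEGER,  -- 1 = Team 1, -1 = Team 2, NULL = unknown
    -- End columns are TEXT because a cell may hold 'X' as well as a score
    {END_COLUMNS}
);
CREATE TABLE Updates (date TEXT);
"""


def create_database(path=":memory:"):
    """Open (or create) a database file and create the four tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Events', 'Matches', 'Teams', 'Updates']
```

Team1 and Team2 are not declared as SQL foreign keys here because a team name alone may not be unique across men's and women's teams; the join instead matches on name and type.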

Because CurlingZone has become more popular over time, 94% of the matches in my webscraped database happened since the 2010-2011 season. In my analyses, I chose to focus mainly on data aggregated from the 2010-2011 season to the present. The rules and strategy of curling have changed somewhat since 2010, but these changes have generally not dramatically shifted the balance of the game.

Because the database is webscraped, the data is naturally messy due to data entry errors, inconsistencies in formatting from year to year, etc. I chose to leave the raw webscraped data in the database and perform cleaning before each analysis run, rather than permanently altering the table to include only cleaned data. When running the analysis, I temporarily remove matches with the following errors:

This removes ~6200 matches from the database, out of >178,000, so the rate of incomplete or obviously incorrect data is ~3.5%. Most of the removed matches are dropped because of incomplete data.
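As an illustration of the kind of consistency check such cleaning can involve, the sketch below tests whether a match's line score adds up to its final score. This check is hypothetical and not necessarily one of the criteria used here; it assumes raw End cells arrive as strings, `None`, or `X`, following the column conventions in the table above.

```python
def parse_end(value):
    """Convert a raw End cell to an int, or None for 'X', NULL, or blank."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.upper() == "X":
        return None
    return int(text)


def line_score_matches_final(ends, final1, final2):
    """Hypothetical cleaning check: do the end scores add up to the finals?

    Team 1's total is the sum of positive end scores; Team 2's total is the
    sum of the magnitudes of negative end scores. Blank ends (0) add nothing.
    """
    played = [s for s in (parse_end(v) for v in ends) if s is not None]
    team1 = sum(s for s in played if s > 0)
    team2 = sum(-s for s in played if s < 0)
    return team1 == final1 and team2 == final2


row = ["0", "2", "-1", "1", "X", None]
print(line_score_matches_final(row, 3, 1))  # True
print(line_score_matches_final(row, 4, 1))  # False
```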

Much of the analysis below categorizes the matches by mens/womens, match length, and team rankings. Since none of those features are stored permanently in the Matches table, they need to be computed by joining with other tables. For example, finding matches between two top-25 women's teams requires one join with the Events table to find the season of each match, one join with Teams to find the gender and season rank of Team 1, and a second join with Teams to find the gender and season rank of Team 2. This works, but in practice collecting the results over 14 seasons for ~30 combinations of gender, match length, and rank is slow.
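The three joins described above can be sketched as a single SQLite query run from Python. This is a hypothetical illustration, not the query used for this analysis. It assumes event dates are stored as ISO `YYYY-MM-DD` text, that a season rolls over on August 1 (so the 2010-2011 season maps to `World2011`), that the type label is the literal `womens`, and that rank means position by ranking points within a type. It uses SQLite's `RANK()` window function, available since SQLite 3.25.

```python
import sqlite3


def top_ranked_matches(conn, season_end_year, team_type="womens", top_n=25):
    """Matches in one season where both teams rank in the top N of a type.

    Assumptions (hypothetical): seasons run Aug 1 - Jul 31, dates are ISO
    text, and season points live in the World<season_end_year> column.
    """
    year = int(season_end_year)  # column names cannot be bound parameters
    points_col = f"World{year}"
    sql = f"""
    WITH ranked AS (
        SELECT Name, Type,
               RANK() OVER (PARTITION BY Type ORDER BY {points_col} DESC) AS rnk
        FROM Teams
        WHERE {points_col} IS NOT NULL
    )
    SELECT m.EID, m.Team1, m.Team2, r1.rnk, r2.rnk
    FROM Matches AS m
    JOIN Events AS e ON e.EID = m.EID                            -- season
    JOIN ranked AS r1 ON r1.Name = m.Team1 AND r1.Type = ?       -- Team 1
    JOIN ranked AS r2 ON r2.Name = m.Team2 AND r2.Type = ?       -- Team 2
    WHERE e.Date >= ? AND e.Date < ?
      AND r1.rnk <= ? AND r2.rnk <= ?
    ORDER BY e.Date
    """
    params = (team_type, team_type, f"{year - 1}-08-01", f"{year}-08-01",
              top_n, top_n)
    return conn.execute(sql, params).fetchall()


# Tiny made-up example: three women's teams, one event, two matches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (EID INTEGER, Name TEXT, Date TEXT, Type TEXT);
CREATE TABLE Teams (Name TEXT, Type TEXT, World2024 REAL);
CREATE TABLE Matches (EID INTEGER, Team1 TEXT, Team2 TEXT);
INSERT INTO Events VALUES (1, 'Example Open', '2023-10-05', 'womens');
INSERT INTO Teams VALUES ('A', 'womens', 300), ('B', 'womens', 250), ('C', 'womens', 10);
INSERT INTO Matches VALUES (1, 'A', 'B'), (1, 'A', 'C');
""")
print(top_ranked_matches(conn, 2024, top_n=2))  # [(1, 'A', 'B', 1, 2)]
```

Repeating this for every season, gender, match length, and rank cutoff is what makes the on-the-fly approach slow.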

To speed up the calculations, I used joins to pre-compute the features needed to categorize the matches and temporarily added those features to the Matches table. To keep the permanent database small, I chose not to store those features permanently. The pre-computed features are:
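One way to get the same effect in SQLite is a `TEMP` table, which exists only for the current connection and is never written to the database file. This swaps in a temp table for adding columns to Matches, and the feature set shown (event type, season, and ends actually played) is a hypothetical example. It assumes ISO-formatted dates and an August 1 season rollover. Ends played can be fewer than the scheduled match length when a team concedes.

```python
import sqlite3

ENDS = range(1, 13)


def build_match_features(conn):
    """Create a TEMP table of per-match features (hypothetical feature set).

    TEMP tables live only for the current connection, so the permanent
    database is not enlarged. Season is assumed to roll over on August 1,
    so an event in November 2010 belongs to the 2011 (2010-2011) season.
    """
    # Count End cells that hold a score (not NULL and not 'X').
    ends_played = " + ".join(
        f"(CASE WHEN m.End{i} IS NOT NULL AND m.End{i} <> 'X' THEN 1 ELSE 0 END)"
        for i in ENDS
    )
    conn.executescript(f"""
    DROP TABLE IF EXISTS temp.MatchFeatures;
    CREATE TEMP TABLE MatchFeatures AS
    SELECT m.rowid AS MatchID,
           e.Type AS Type,
           CAST(strftime('%Y', e.Date) AS INTEGER)
             + (CAST(strftime('%m', e.Date) AS INTEGER) >= 8) AS Season,
           {ends_played} AS EndsPlayed
    FROM Matches AS m
    JOIN Events AS e ON e.EID = m.EID;
    """)


conn = sqlite3.connect(":memory:")
end_cols = ", ".join(f"End{i} TEXT" for i in ENDS)
conn.executescript(f"""
CREATE TABLE Events (EID INTEGER, Name TEXT, Date TEXT, Type TEXT);
CREATE TABLE Matches (EID INTEGER, Team1 TEXT, Team2 TEXT, {end_cols});
INSERT INTO Events VALUES (7, 'Example Classic', '2010-11-20', 'mens');
INSERT INTO Matches (EID, Team1, Team2, End1, End2, End3, End4)
VALUES (7, 'A', 'B', '1', '-2', '0', 'X');
""")
build_match_features(conn)
print(conn.execute("SELECT Type, Season, EndsPlayed FROM MatchFeatures").fetchall())
# [('mens', 2011, 3)]
```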

Timing of Matches

The graph below shows the number of curling matches that took place each month, and can be filtered by mens/womens, match length in ends, and ranks of the two teams.
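The monthly counts behind a graph like this can be sketched as one grouped query. Because the Events table stores only each event's start date, the sketch assigns every match to the month its event started; this is an approximation for events that cross a month boundary, and the ISO date format is an assumption.

```python
import sqlite3


def matches_per_month(conn):
    """Count matches per calendar month (YYYY-MM) of the event start date."""
    sql = """
    SELECT strftime('%Y-%m', e.Date) AS Month, COUNT(*) AS NumMatches
    FROM Matches AS m
    JOIN Events AS e ON e.EID = m.EID
    GROUP BY Month
    ORDER BY Month
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (EID INTEGER, Date TEXT);
CREATE TABLE Matches (EID INTEGER, Team1 TEXT, Team2 TEXT);
INSERT INTO Events VALUES (1, '2023-10-05'), (2, '2023-10-28'), (3, '2023-11-10');
INSERT INTO Matches VALUES (1, 'A', 'B'), (2, 'C', 'D'), (2, 'A', 'C'), (3, 'B', 'D');
""")
print(matches_per_month(conn))  # [('2023-10', 3), ('2023-11', 1)]
```

Filters such as gender, match length, or rank would be added as `WHERE` conditions on top of this query.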

This graph shows the following trends:

Distribution of Final Scores

The graph below shows the number of curling matches with a given final score since the start of the 2010-2011 season, and can be filtered by mens/womens, match length in ends, and ranks of the two teams.

This graph shows the following trends:

Distribution of Scores in Each End

The graph below shows the distribution of scores in each end of a curling match, and can be filtered by mens/womens, match length in ends, and ranks of the two teams. Positive scores mean that the team with hammer scored, and negative scores mean that the team without hammer scored (stole).
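Since the End columns record scores relative to Team 1, each end score has to be flipped to the hammer team's perspective before it can be tallied this way. A minimal sketch, assuming the `Ham1` and `End` sign conventions from the database table and that matches with unknown `Ham1` or unparseable ends are skipped beforehand:

```python
from collections import Counter


def hammer_relative_scores(ham1, end_scores):
    """Convert Team 1-relative end scores to hammer-relative scores.

    ham1: 1 if Team 1 had hammer in End 1, -1 if Team 2 did.
    end_scores: ints, positive = Team 1 scored, negative = Team 2 scored.
    Returns scores where positive = hammer team scored, negative = steal.
    """
    relative = []
    hammer = ham1
    for score in end_scores:
        relative.append(score * hammer)  # flip sign when Team 2 has hammer
        if score != 0:
            # The team that did not score gets hammer in the next end.
            hammer = -1 if score > 0 else 1
    return relative


def end_score_distribution(matches):
    """Tally hammer-relative end scores over (ham1, end_scores) pairs."""
    counts = Counter()
    for ham1, ends in matches:
        counts.update(hammer_relative_scores(ham1, ends))
    return counts


sample = [(1, [0, 2, -1, 1]), (-1, [-2, 1, 0])]
print(dict(sorted(end_score_distribution(sample).items())))  # {0: 2, 1: 3, 2: 2}
```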

This graph shows the following trends:

Testing the Common Wisdom: Having hammer in even ends is a big advantage

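In high-level curling, most players and commentators believe that having hammer (last rock) in even-numbered ends is a big advantage late in the game. The team with hammer has more control over the outcome of the end and is much more likely to score. Because hammer passes to the team that did not score, a team with hammer in an even end is also on track to have hammer again in the last end of an 8- or 10-end game if each remaining end is scored by the team with hammer.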

For example, a team that is down 2 points with hammer in end 8 of 10 could score 2 in end 8 (tying the score), force their opponent to score 1 point in end 9 (falling behind by one but regaining hammer), and then score 2 in end 10 to win by 1 point. In contrast, a team that is down by even 1 point without hammer in end 8 of 10 has a trickier route to winning: they need to either score 3 points when they have hammer or steal at least 1 point to have a chance of winning.

The graph below shows the probability of Team 1 winning a curling match, and can be filtered by mens/womens, match length in ends, and ranks of the two teams. Positive margins mean that Team 1 is ahead, and negative margins mean that Team 1 is behind.
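An empirical win-probability table like this can be built by walking each match's line score and recording the game state at the start of every end. The sketch below is one hypothetical way to do it, not the method behind the graph. It keys each state on the end number, Team 1's margin entering that end, and whether Team 1 has hammer; it determines the winner from `Final1` and `Final2`, skips tied finals, and assumes the `Ham1`/`End` sign conventions with unknown ends already removed.

```python
from collections import defaultdict


def win_probability_table(matches):
    """Empirical P(Team 1 wins) at the start of each end.

    matches: iterable of (ham1, end_scores, final1, final2), where end_scores
    are Team 1-relative ints for the ends actually played.
    Keys are (end_number, margin, team1_has_hammer); margin is Team 1's lead
    entering that end. Matches with tied final scores are skipped.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for ham1, ends, final1, final2 in matches:
        if final1 == final2:
            continue
        team1_won = final1 > final2
        margin, hammer = 0, ham1
        for end_number, score in enumerate(ends, start=1):
            key = (end_number, margin, hammer == 1)
            totals[key] += 1
            wins[key] += int(team1_won)
            margin += score
            if score != 0:
                # The team that did not score gets hammer in the next end.
                hammer = -1 if score > 0 else 1
    return {key: wins[key] / totals[key] for key in totals}


sample = [
    (1, [1, 0, -2, 2], 3, 2),   # Team 1 wins 3-2
    (1, [0, -1, 1, -1], 1, 2),  # Team 2 wins 2-1
]
table = win_probability_table(sample)
print(table[(1, 0, True)])  # 0.5
```

With real data, states seen in only a handful of matches give noisy estimates, so sparse cells are worth flagging or merging before plotting.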

This graph shows the following trends: